Introduction

Stefan Jünger & Dennis Abel

2025-04-09

Goal of this course

This course will teach you how to exploit R and apply its geospatial techniques in a social science context.

By the end of this course, you should…

  • Be comfortable with using geospatial data in R
    • Including importing, wrangling, and exploring geospatial data
  • Be able to create maps based on your very own processed geospatial data in R
  • Feel prepared for (your first steps in) spatial analysis

Illustration by Allison Horst

We are (necessarily) selective

There’s a multitude of spatial R packages

  • We cannot cover all of them
  • We cannot cover all functions
  • You may have used some we are not familiar with

We will show the packages we use in practice

  • There’s always another way of doing things in R
  • Don’t hesitate to bring up your solutions

You can’t learn everything at once, but you also don’t have to!

Prerequisites for this course

  • Knowledge of R, its syntax, and internal logic
    • Affinity for using script-based languages
    • Don’t be scared to wrangle data with complex structures
  • Working versions of R (and RStudio) on your computer
    • Ideally, with the packages installed that we asked you to install upfront

About us

Stefan

  • Senior Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in social sciences, University of Cologne
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Social inequalities
    • Attitudes towards minorities
    • Environmental attitudes
    • Reproducible research

About us

Dennis

  • Senior Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in social sciences, University of Cologne
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Social inequalities
    • Attitudes towards minorities
    • Environmental attitudes
    • Reproducible research

About you

  • What’s your name?
  • Where do you work/research?
  • What are you working on/researching?
  • What is your experience with R or other programming languages?
  • Do you already have experience with geospatial data?

Course schedule

Day       Time         Title
April 09  10:00-11:30  Introduction
April 09  11:30-11:45  Coffee Break
April 09  11:45-13:00  Data Formats
April 09  13:00-14:00  Lunch Break
April 09  14:00-15:30  Mapping I
April 09  15:30-15:45  Coffee Break
April 09  15:45-17:00  Spatial Linking & Analysis
April 10  09:00-10:30  Mapping II
April 10  10:30-10:45  Coffee Break
April 10  10:45-12:00  Applied Spatial Linking
April 10  12:00-13:00  Lunch Break
April 10  13:00-14:30  Spatial Autocorrelation
April 10  14:30-14:45  Coffee Break
April 10  14:45-16:00  Spatial Econometrics & Outlook

Now

Day Time Title
April 09 10:00-11:30 Introduction

Why?

A lot of (classic) theories inherently make use of space (e.g., Allport 1954)

  • It’s where people interact
  • It’s what people collectively shape
  • Space becomes place

Thus, there’s a deep intersection, or even embeddedness, of space in social science research

  • It’s what geographers call a “human-environment system”
  • But often these links are only implicit in our data

Geographic information in social science research

Exploiting geographic information is not new.

For example, Siegfried (1913) used soil composition information to explain election results in France.

Remember the Chicago School?

Today

Many studies still rely on these ideas but incorporate space directly, e.g.,

  • Iyer, A., & Pryce, G. (2023). Theorising the causal impacts of social frontiers: The social and psychological implications of discontinuities in the geography of residential mix. Urban Studies, https://doi.org/10.1177/00420980231194834
  • Kent, J. (2022). Can urban fabric encourage tolerance? Evidence that the structure of cities influences attitudes toward migrants in Europe. Cities, 121, 103494. https://doi.org/10.1016/j.cities.2021.103494
  • Schmidt, K., Jacobsen, J., & Iglauer, T. (2023). Proximity to refugee accommodations does not affect locals’ attitudes toward refugees: Evidence from Germany. European Sociological Review, jcad028. https://doi.org/10.1093/esr/jcad028
  • Xu, A. Z. (2023). Segregation and the Spatial Externalities of Inequality: A Theory of Interdependence and Public Goods in Cities. American Political Science Review, 1–18. https://doi.org/10.1017/S0003055423000722
  • Jünger, S., & Schaeffer, M. (2023). Ethnic Diversity and Social Integration—What are the Consequences of Ethnic Residential Boundaries and Halos for Social Integration in Germany? KZfSS Kölner Zeitschrift Für Soziologie Und Sozialpsychologie. https://doi.org/10.1007/s11577-023-00888-1

Data landscape

Increased amount of available data

  • Quantitative and on a small spatial scale

Better tools

  • Personal computers with enough horsepower
  • Standard software, such as R, can be used as a Geographic Information System (GIS)

What are geospatial data?

Data with a direct spatial reference

→ geo-coordinates x, y (and z)

  • Information about geometries
  • Optional: Content in relation to the geometries
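
A minimal sketch of this idea, assuming the sf package is installed: a plain table with x and y columns becomes geospatial once the coordinates are declared as geometries. The place names, the rounded coordinates, and the choice of EPSG:4326 (longitude/latitude) are illustrative assumptions and not part of the course data.

```r
library(sf)

# Hypothetical example table: one row per place,
# with rounded coordinates for illustration only
places <- data.frame(
  name = c("Cologne", "Mannheim"),
  x = c(6.96, 8.47),   # longitude
  y = c(50.94, 49.49)  # latitude
)

# Declare which columns hold the coordinates and which CRS they refer to
# (assumed here: EPSG:4326, i.e., longitude/latitude in WGS 84)
places_sf <- st_as_sf(places, coords = c("x", "y"), crs = 4326)

# The result has a geometry column (the spatial reference)
# next to the remaining content column (name)
places_sf
```

The geometry column carries the "where", while all other columns carry the content related to it.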

Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019

Geospatial data in this course I

In the folder called ./data, you can find (most of) the data files prepped for all the exercises and slides. The following data are included:

Geospatial data in this course II

If you reuse any of the provided data, please make sure to cite the original data sources.

What is GIS?

Most common understanding: Geographic Information Systems (GIS) as specific software to process geospatial data for

  • Visualization
  • Analysis
  • Interpretation

Data specifics

Sources: OpenStreetMap / GEOFABRIK (2018) and City of Cologne (2014)

Formats

  • Vector data (points, lines, polygons)
  • Raster data (grids)
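
To make the two formats concrete, here is a small sketch, assuming the sf package is installed. All coordinates and cell values are arbitrary toy numbers. The raster is represented as a plain base R matrix only to show the grid idea; dedicated raster packages additionally store information such as the extent and the CRS.

```r
library(sf)

# Vector data: geometries defined by their coordinates
point <- st_point(c(1, 1))
line <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))
# A polygon ring has to be closed: first and last coordinate are identical
polygon <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))

# Raster data: values arranged in a regular grid of cells
# (toy stand-in: a 3 x 4 matrix)
grid <- matrix(1:12, nrow = 3, ncol = 4)

class(point)
class(line)
class(polygon)
dim(grid)
```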

Coordinate reference systems

  • Allow the projection onto the Earth’s surface
  • Differ in precision for specific purposes

Layers Must Match!

EPSG:3857

EPSG:3035

Source: Statistical Office of the European Union Eurostat (2018) / Jünger, 2019

Types of CRS

Differentiating between CRS is wild (at least for me…). You may hear of geographic, geocentric, projected, or local CRS in your research.

What’s the difference?

  • Whether two-dimensional (longitude, latitude) or three-dimensional (+ height) coordinates are used
  • The location of the coordinate system’s origin (center of the Earth or not)
  • Projection onto a flat surface (transformation of longitudes and latitudes to x and y coordinates)
  • Location (the smaller the covered area, the more precise the projection)

In practice, you shouldn’t worry too much about CRS. Again, what matters is that they match.
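
What "matching" means in practice can be sketched as follows, assuming the sf package is installed. The point and the object names are made up for illustration; `st_crs()` reports the CRS of an object and `st_transform()` re-projects it.

```r
library(sf)

# A made-up point in longitude/latitude (EPSG:4326)
pt_wgs84 <- st_sfc(st_point(c(10, 52)), crs = 4326)

# A second layer in a projected CRS (EPSG:3035),
# created here simply by re-projecting the first one
pt_laea <- st_transform(pt_wgs84, crs = 3035)

# Do the two layers share the same CRS?
crs_match_before <- st_crs(pt_wgs84) == st_crs(pt_laea)

# Bring the first layer into the CRS of the second one
pt_wgs84_matched <- st_transform(pt_wgs84, crs = st_crs(pt_laea))
crs_match_after <- st_crs(pt_wgs84_matched) == st_crs(pt_laea)

crs_match_before
crs_match_after
```

Passing `st_crs()` of the target layer to `st_transform()` avoids typing any CRS definition by hand.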

Old standard: PROJ.4 strings

This is how information about the CRS is defined in the classic standard:

+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

Source: https://epsg.io/3035

(It’s nothing you would type by hand)

New kid in town: WKT (“Well-Known Text”)


PROJCS["ETRS89 / LAEA Europe",
    GEOGCS["ETRS89",
        DATUM["European_Terrestrial_Reference_System_1989",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6258"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4258"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",52],
    PARAMETER["longitude_of_center",10],
    PARAMETER["false_easting",4321000],
    PARAMETER["false_northing",3210000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AUTHORITY["EPSG","3035"]]

Source: https://epsg.io/3035

EPSG Codes

In the end, it’s not as challenging to work with CRS in R as it may seem

  • we don’t have to use PROJ.4 or WKT strings directly

Most of the time, it’s enough to use so-called EPSG codes (“European Petroleum Survey Group Geodesy”)

  • Short sequence of digits
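
As a small sketch, assuming the sf package is installed, an EPSG code is enough to look up the full CRS definition. The object name `crs_laea` is made up for illustration. The exact text that is printed depends on the installed versions of sf and PROJ and may differ from the strings shown on the previous slides.

```r
library(sf)

# Look up the CRS by its EPSG code (3035 = ETRS89 / LAEA Europe)
crs_laea <- st_crs(3035)

# The short code ...
crs_laea$epsg

# ... gives access to the longer representations
crs_laea$proj4string  # PROJ.4-style string
cat(crs_laea$wkt)     # WKT string
```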

More details on geospatial data

Let’s learn about geospatial data as we learn about specific formats

Packages in this course I

We will use plenty of different packages during the course, but only a few are our main drivers (e.g., the sf package). Here’s the list of packages:

Packages in this course II

Exercise 1_1_1: Package Installation
